
    Approximating Word Ranking and Negative Sampling for Word Embedding

    CBOW (Continuous Bag-Of-Words) is one of the most commonly used techniques to generate word embeddings for various NLP tasks. However, it falls short of optimal performance because it involves all positive words uniformly and draws negative words from a simplistic sampling distribution. To resolve these issues, we propose OptRank, which optimizes word ranking and approximates negative sampling to improve word embeddings. Specifically, we first formalize word embedding as a ranking problem. We then weight the positive words by their ranks, so that highly ranked words carry more importance, and adopt a dynamic sampling strategy to select informative negative words. In addition, we design an approximation method to compute word ranks efficiently. Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset at different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank
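
    The exact algorithm lives in the linked repository; purely as an illustration, the NumPy sketch below (all names and simplifications ours, not the paper's code) shows the two core ideas: approximating a positive word's rank from a small candidate set so its update can be rank-weighted, and picking negatives adaptively from the highest-scoring candidates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def optrank_style_step(W_in, W_out, center, positive, lr=0.025,
                       n_candidates=50, n_negatives=5, rng=None):
    """One SGD step sketching rank-weighted positive updates and
    adaptive negative sampling (a hypothetical simplification)."""
    rng = rng or np.random.default_rng()
    h = W_in[center]                      # input (context) vector
    scores = W_out @ h                    # scores over the vocabulary

    # Approximate the positive word's rank by counting higher-scoring
    # words in a small random candidate set, not the full vocabulary.
    cand = rng.choice(len(W_out), size=n_candidates, replace=False)
    approx_rank = 1 + int(np.sum(scores[cand] > scores[positive]))
    rank_weight = np.log1p(approx_rank)   # poorly ranked -> larger update

    # Dynamic negative sampling: keep the highest-scoring candidates,
    # which are the most informative violators.
    cand = cand[cand != positive]
    negatives = cand[np.argsort(-scores[cand])[:n_negatives]]

    # SGNS-style pairwise updates, with the positive term rank-weighted.
    grad_h = np.zeros_like(h)
    g_pos = rank_weight * (1.0 - sigmoid(scores[positive]))
    grad_h += g_pos * W_out[positive]
    W_out[positive] += lr * g_pos * h
    for neg in negatives:
        g_neg = -sigmoid(scores[neg])
        grad_h += g_neg * W_out[neg]
        W_out[neg] += lr * g_neg * h
    W_in[center] += lr * grad_h
```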

    BoostFM: Boosted Factorization Machines for Top-N Feature-based Recommendation

    Feature-based matrix factorization techniques such as Factorization Machines (FM) have proven to achieve impressive accuracy on the rating prediction task. However, most common recommendation scenarios are formulated as a top-N item ranking problem with implicit feedback (e.g., clicks, purchases) rather than explicit ratings. To address this problem with both implicit feedback and feature information, we propose a feature-based collaborative boosting recommender called BoostFM, which integrates boosting into factorization models during the item ranking process. Specifically, BoostFM is an adaptive boosting framework that linearly combines multiple homogeneous component recommenders, which are repeatedly constructed on the basis of the individual FM model by a re-weighting scheme. We propose two ways to train the component recommenders efficiently, from both the pairwise and the listwise Learning-to-Rank (L2R) perspective. We study the properties of the proposed method empirically on three real-world datasets. The experimental results show that BoostFM outperforms a number of state-of-the-art approaches for top-N recommendation.
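
    The paper trains FM-based components with pairwise or listwise L2R; the sketch below hides that behind hypothetical fit_component and rank_of callables and only illustrates the AdaBoost-style outer loop of re-weighting and linear combination. The weighting math is our stand-in, not the paper's derivation.

```python
import numpy as np

def boostfm_style_train(fit_component, rank_of, observed, n_components=5):
    """Adaptive-boosting loop over homogeneous component rankers.

    Hypothetical interfaces standing in for the FM components:
      fit_component(weights) -> scorer, a callable (user, item) -> float
      rank_of(scorer, user, item) -> rank of `item` in the user's list (>= 1)
    """
    weights = np.full(len(observed), 1.0 / len(observed))
    ensemble = []   # list of (coefficient, scorer) pairs

    for _ in range(n_components):
        scorer = fit_component(weights)   # train one FM on re-weighted data

        # How well does this component rank each observed (user, item) pair?
        ranks = np.array([rank_of(scorer, u, i) for u, i in observed])
        quality = 1.0 / ranks             # reciprocal rank in (0, 1]

        # Larger coefficient for components that rank observed items well.
        alpha = np.log((weights * quality).sum() /
                       max((weights * (1.0 - quality)).sum(), 1e-12))
        ensemble.append((alpha, scorer))

        # Re-weight: emphasize pairs the components still rank poorly.
        weights *= np.exp(-alpha * quality)
        weights /= weights.sum()

    def predict(user, item):
        """Linear combination of the component scores."""
        return sum(a * s(user, item) for a, s in ensemble)
    return predict
```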

    LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates

    State-of-the-art item recommendation algorithms, which apply Factorization Machines (FM) as a scoring function and a pairwise ranking loss as a trainer (PRFM for short), have recently been investigated for the implicit-feedback-based context-aware recommendation problem (IFCAR). However, good recommenders particularly emphasize accuracy near the top of the ranked list, and typical pairwise loss functions may not match this requirement well. In this paper, we demonstrate, both theoretically and empirically, that PRFM models usually lead to non-optimal item recommendation results due to this mismatch. Inspired by the success of LambdaRank, we introduce Lambda Factorization Machines (LambdaFM), which is particularly intended for optimizing ranking performance in IFCAR. We also point out that the original lambda function suffers from expensive computational complexity in such settings due to the large amount of unobserved feedback. Hence, instead of directly adopting the original lambda strategy, we create three effective lambda surrogates by conducting a theoretical analysis of lambda from the top-N optimization perspective. Further, we prove that the proposed lambda surrogates are generic and applicable to a large set of pairwise ranking loss functions. Experimental results demonstrate that LambdaFM significantly outperforms state-of-the-art algorithms on three real-world datasets in terms of four standard ranking measures.
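
    The three surrogates are derived in the paper itself; a common cheap stand-in for the full lambda weight is to bias negative sampling toward high-scoring ("hard") unobserved items, which concentrates updates near the top of the list without scanning all unobserved feedback. A minimal sketch of that idea around a generic BPR-style pairwise step, with hypothetical score_fn/update_fn hooks:

```python
import numpy as np

def sample_hard_negative(score_fn, user, unobserved, n_candidates=10,
                         rng=None):
    """Draw a few unobserved items and return the highest-scoring one.

    Hard negatives near the top of the list yield larger effective
    'lambda' weights than uniform sampling, at O(n_candidates) cost.
    """
    rng = rng or np.random.default_rng()
    k = min(n_candidates, len(unobserved))
    cand = rng.choice(unobserved, size=k, replace=False)
    scores = np.array([score_fn(user, j) for j in cand])
    return cand[int(np.argmax(scores))]

def pairwise_step(score_fn, update_fn, user, pos, unobserved, lr=0.05):
    """Generic pairwise step: push the observed item above the sampled
    hard negative. `update_fn` applies the model-specific FM gradient."""
    neg = sample_hard_negative(score_fn, user, unobserved)
    margin = score_fn(user, pos) - score_fn(user, neg)
    weight = 1.0 / (1.0 + np.exp(margin))   # d/dx of -ln sigmoid(x)
    update_fn(user, pos, neg, lr * weight)
```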

    Joint Geo-Spatial Preference and Pairwise Ranking for Point-of-Interest Recommendation

    Recommending preferred points of interest (POIs) to users has become an important task for location-based social networks, facilitating users' urban exploration by helping them filter out unattractive locations. Although the influence of the geographical neighborhood has been studied for the rating prediction task (i.e., regression), few works have exploited it to develop a ranking-oriented objective function that improves top-N item recommendation. To solve this task, we conduct a manual inspection of real-world datasets and find that each individual's visited POIs are likely to cluster around multiple centers. Hence, we propose a co-pairwise ranking model based on the assumption that users prefer to assign higher ranks to POIs near previously rated ones. The proposed method can learn a preference ordering from non-observed rating pairs, and thus alleviates the sparsity problem of matrix factorization. Evaluation on two publicly available datasets shows that our method performs significantly better than state-of-the-art techniques on the top-N item recommendation task.
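
    As a toy rendering of the geographic assumption (not the paper's full model), the sketch below scores an unvisited POI by its proximity to the user's previously rated locations and uses that score as a pairwise preference signal between two non-observed POIs; all function names are ours.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2 +
         np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def geo_preference(candidate, visited_coords):
    """Score an unvisited POI by proximity to previously rated POIs:
    the closer to one of the user's activity centers, the higher the
    assumed preference (a stand-in for the multi-center finding)."""
    dists = [haversine_km(*candidate, *v) for v in visited_coords]
    return 1.0 / (1.0 + min(dists))

def prefers(poi_a, poi_b, visited_coords):
    """Co-pairwise signal between two non-observed POIs: rank the one
    nearer the user's previously rated locations higher."""
    return (geo_preference(poi_a, visited_coords) >
            geo_preference(poi_b, visited_coords))
```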

    Auto-Surprise: An Automated Recommender-System (AutoRecSys) Library with Tree of Parzens Estimator (TPE) Optimization

    We introduce Auto-Surprise, an automated recommender system library. Auto-Surprise is an extension of the Surprise recommender system library that eases the algorithm selection and configuration process. Compared to the out-of-the-box Surprise library, Auto-Surprise performs better when evaluated on the MovieLens, Book Crossing, and Jester datasets, and it may also select an algorithm with significantly lower runtime. Compared to Surprise's grid search, Auto-Surprise performs equally well or slightly better in terms of RMSE, and is notably faster at finding the optimal hyperparameters.
    Comment: To be presented at RecSys '20, the Fourteenth ACM Conference on Recommender Systems, September 21-26, 2020, Virtual Event.
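
    To show what kind of search is being automated, the sketch below hand-rolls the equivalent TPE loop, tuning Surprise's SVD with Hyperopt's tpe.suggest. This is not Auto-Surprise's own API, and the search space here is our guess at a reasonable one.

```python
from hyperopt import fmin, hp, tpe
from surprise import SVD, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")   # MovieLens 100k

def objective(params):
    """Mean cross-validated RMSE for one hyperparameter configuration."""
    algo = SVD(n_factors=int(params["n_factors"]),
               lr_all=params["lr_all"],
               reg_all=params["reg_all"])
    result = cross_validate(algo, data, measures=["rmse"], cv=3,
                            verbose=False)
    return result["test_rmse"].mean()

# Hypothetical search space; Auto-Surprise defines its own per algorithm.
space = {
    "n_factors": hp.quniform("n_factors", 20, 200, 10),
    "lr_all": hp.loguniform("lr_all", -6, -2),
    "reg_all": hp.loguniform("reg_all", -6, -2),
}

# TPE models good and bad trials separately and proposes promising configs.
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
print(best)
```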

    VSE-ens: Visual-Semantic Embeddings with Efficient Negative Sampling

    Joint visual-semantic embeddings (VSE) have become a research hotspot for the task of image annotation, which suffers from the semantic gap, i.e., the gap between images' visual features (low-level) and labels' semantic features (high-level). This issue is even more challenging when visual features cannot be retrieved from images, that is, when images are denoted only by numerical IDs, as in some real datasets. Existing VSE methods typically sample negative examples that violate the ranking order against positive examples uniformly, which requires a time-consuming search over the whole label space. In this paper, we propose a fast adaptive negative sampler that works well even when no image pixels are available. Our sampling strategy is to choose the negative examples that are most likely to violate the ranking order, according to the latent factors of images. In this way, our approach scales linearly to large datasets. The experiments demonstrate that our approach converges 5.02x faster than state-of-the-art approaches on OpenImages, 2.5x faster on IAPR TC-12, and 2.06x faster on NUS-WIDE, with better ranking accuracy across all datasets.
    Comment: Published in The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).
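
    A minimal sketch of the sampler's core step, assuming plain matrix-factorization latent factors (matrix and function names hypothetical): draw a small candidate set, score it against the image's latent vector, and keep the most likely violator instead of searching the whole label space.

```python
import numpy as np

def adaptive_negative(U_img, V_lab, image_id, positive_label,
                      n_candidates=20, rng=None):
    """Pick a likely-violating negative label from a small candidate set.

    U_img and V_lab are latent factor matrices for images and labels;
    only IDs and factors are needed, no pixels.
    """
    rng = rng or np.random.default_rng()
    u = U_img[image_id]
    cand = rng.choice(len(V_lab), size=n_candidates, replace=False)
    cand = cand[cand != positive_label]
    scores = V_lab[cand] @ u                 # predicted label relevance
    neg = cand[int(np.argmax(scores))]       # most likely violator

    # A violation exists only if the negative outranks the positive;
    # only then does a pairwise update carry any gradient.
    violates = scores.max() > V_lab[positive_label] @ u
    return neg, violates
```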